sobolev training
Sobolev Training for Neural Networks
At the heart of deep learning, we aim to use neural networks as function approximators, training them to produce outputs from inputs in emulation of a ground-truth function or data-generating process. In many cases we only have access to input-output pairs from the ground truth; however, it is becoming more common to have access to derivatives of the target output with respect to the input -- for example, when the ground-truth function is itself a neural network, as in network compression or distillation. Generally these target derivatives are either not computed or ignored. This paper introduces Sobolev Training for neural networks, a method for incorporating these target derivatives in addition to the target values while training.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
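As a concrete illustration of the idea in the abstract above: the training objective augments the usual value-matching loss with a derivative-matching term, with target derivatives obtained by differentiating the teacher. A minimal PyTorch sketch (the names `student`, `teacher`, and the weighting `lam` are illustrative, not from the paper):

```python
import torch

def sobolev_loss(student, teacher, x, lam=1.0):
    """Match the teacher's outputs and input-derivatives (first-order Sobolev loss)."""
    x = x.clone().requires_grad_(True)
    # Target values and target derivatives from the (frozen) teacher
    y_t = teacher(x)
    (dy_t,) = torch.autograd.grad(y_t.sum(), x)
    # Student values and derivatives; create_graph=True so the derivative
    # mismatch can itself be backpropagated to the student's parameters
    y_s = student(x)
    (dy_s,) = torch.autograd.grad(y_s.sum(), x, create_graph=True)
    value_term = ((y_s - y_t.detach()) ** 2).mean()
    deriv_term = ((dy_s - dy_t.detach()) ** 2).sum(dim=-1).mean()
    return value_term + lam * deriv_term
```

Summing the outputs before calling `autograd.grad` yields per-sample input gradients in one pass, since each sample's output depends only on its own input.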
First-order Sobolev Reinforcement Learning
Schramm, Fabian, Perrin-Gilbert, Nicolas, Carpentier, Justin
We propose a refinement of temporal-difference learning that enforces first-order Bellman consistency: the learned value function is trained to match not only the Bellman targets in value but also their derivatives with respect to states and actions. By differentiating the Bellman backup through differentiable dynamics, we obtain analytically consistent gradient targets. Incorporating these into the critic objective using a Sobolev-type loss encourages the critic to align with both the value and local geometry of the target function. This first-order TD matching principle can be seamlessly integrated into existing algorithms, such as Q-learning or actor-critic methods (e.g., DDPG, SAC), potentially leading to faster critic convergence and more stable policy gradients without altering their overall structure.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
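Assuming differentiable dynamics and rewards as the abstract states, the gradient targets come from differentiating the Bellman backup itself. A hedged PyTorch sketch of such a first-order critic loss; `dynamics`, `reward`, `policy`, and `lam` are illustrative placeholders rather than the authors' code:

```python
import torch

def sobolev_td_loss(critic, target_critic, policy, dynamics, reward, s, a,
                    gamma=0.99, lam=1.0):
    s = s.clone().requires_grad_(True)
    a = a.clone().requires_grad_(True)
    # Bellman target, kept differentiable w.r.t. (s, a) through the dynamics
    s_next = dynamics(s, a)
    y = reward(s, a) + gamma * target_critic(s_next, policy(s_next))
    # Analytically consistent gradient targets dy/ds and dy/da (fixed targets:
    # no create_graph, so they do not require grad)
    dy_s, dy_a = torch.autograd.grad(y.sum(), (s, a))
    # Critic value and its gradients; create_graph=True so the gradient
    # mismatch is trainable w.r.t. the critic parameters
    q = critic(s, a)
    dq_s, dq_a = torch.autograd.grad(q.sum(), (s, a), create_graph=True)
    td = ((q - y.detach()) ** 2).mean()
    grad_match = (((dq_s - dy_s) ** 2).sum(-1).mean()
                  + ((dq_a - dy_a) ** 2).sum(-1).mean())
    return td + lam * grad_match
```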
Precise asymptotic analysis of Sobolev training for random feature models
Fisher, Katharine E, Li, Matthew TC, Marzouk, Youssef, Schorlepp, Timo
Gradient information is useful and widely available in applications, and it is therefore natural to include it in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training -- regression with both function and gradient data -- on the generalization error of highly overparameterized predictive models in high dimensions. In this paper, we obtain a precise characterization of this training modality for random feature (RF) models in the limit where the number of trainable parameters, input dimensions, and training data tend proportionally to infinity. Our model for Sobolev training reflects practical implementations by sketching gradient data onto finite dimensional subspaces. By combining the replica method from statistical physics with linearizations in operator-valued free probability theory, we derive a closed-form description for the generalization errors of the trained RF models. For target functions described by single-index models, we demonstrate that supplementing function data with additional gradient data does not universally improve predictive performance. Rather, the degree of overparameterization should inform the choice of training method. More broadly, our results identify settings where models perform optimally by interpolating noisy function and gradient data.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.13)
- North America > United States > New York > New York County > New York City (0.04)
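Because a random feature model is linear in its trainable weights, Sobolev training on function values plus sketched gradients reduces to a single ridge-regression solve, which makes the setting above easy to instantiate. A NumPy sketch under illustrative choices (tanh features, a single-index teacher, a random sketching matrix `S`); the dimensions and scalings are for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, k = 200, 50, 400, 5          # samples, input dim, features, sketch dim

# Single-index teacher y = tanh(<w*, x>/sqrt(d)), with exact gradient data
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
z_star = X @ w_star / np.sqrt(d)
y = np.tanh(z_star)
G_full = (1 - np.tanh(z_star) ** 2)[:, None] * w_star / np.sqrt(d)

S = rng.standard_normal((k, d)) / np.sqrt(d)  # sketch onto a k-dim subspace
G = G_full @ S.T                              # (n, k) sketched gradient targets

# RF model f_a(x) = a^T tanh(W x / sqrt(d)), linear in the weights a
W = rng.standard_normal((p, d))
Z = X @ W.T / np.sqrt(d)
Phi = np.tanh(Z)                              # (n, p) value design matrix
D = 1 - np.tanh(Z) ** 2
# Gradient design per sample i: S W^T diag(D_i) / sqrt(d), shape (k, p)
A_grad = np.einsum('kd,pd,np->nkp', S, W, D) / np.sqrt(d)

lam_g, ridge = 1.0, 1e-3
A = np.vstack([Phi, np.sqrt(lam_g) * A_grad.reshape(n * k, p)])
b = np.concatenate([y, np.sqrt(lam_g) * G.reshape(n * k)])
a_hat = np.linalg.solve(A.T @ A + ridge * np.eye(p), A.T @ b)

print("train value MSE:", np.mean((Phi @ a_hat - y) ** 2))
```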
Sobolev acceleration for neural networks
Oh, Jong Kwon, Lyu, Hanbaek, Son, Hwijae
Sobolev training, which integrates target derivatives into the loss function, has been shown to accelerate convergence and improve generalization compared to conventional $L^2$ training. However, the underlying mechanisms of this training method remain only partially understood. In this work, we present the first rigorous theoretical framework proving that Sobolev training accelerates the convergence of Rectified Linear Unit (ReLU) networks. Under a student-teacher framework with Gaussian inputs and shallow architectures, we derive exact formulas for population gradients and Hessians, and quantify the improvements in conditioning of the loss landscape and gradient-flow convergence rates. Extensive numerical experiments validate our theoretical findings and show that the benefits of Sobolev training extend to modern deep learning tasks.
- Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > Italy (0.04)
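The student-teacher setting analyzed above is straightforward to instantiate empirically. A small PyTorch sketch comparing plain $L^2$ training against Sobolev training for shallow ReLU networks with Gaussian inputs (widths, learning rate, and step count are arbitrary illustrative choices):

```python
import torch

torch.manual_seed(0)
d, n = 8, 2048

def make_net(width):
    return torch.nn.Sequential(
        torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1))

teacher = make_net(16)
x = torch.randn(n, d)  # Gaussian inputs, as in the paper's setting

# Target values and input-gradients from the frozen teacher
x_req = x.clone().requires_grad_(True)
y_t = teacher(x_req)
(dy_t,) = torch.autograd.grad(y_t.sum(), x_req)
y_t, dy_t = y_t.detach(), dy_t.detach()

def train(use_sobolev, steps=500, lr=1e-2):
    torch.manual_seed(1)  # identical student init for a fair comparison
    student = make_net(32)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        xi = x.clone().requires_grad_(True)
        y_s = student(xi)
        (dy_s,) = torch.autograd.grad(y_s.sum(), xi, create_graph=True)
        loss = ((y_s - y_t) ** 2).mean()
        if use_sobolev:
            loss = loss + ((dy_s - dy_t) ** 2).sum(-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ((student(x) - y_t) ** 2).mean().item()

print("value MSE, L2 loss     :", train(False))
print("value MSE, Sobolev loss:", train(True))
```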
Sobolev Training of End-to-End Optimization Proxies
Rosemberg, Andrew W., Garcia, Joaquim Dias, Bent, Russell, Van Hentenryck, Pascal
Optimization proxies - machine learning models trained to approximate the solution mapping of parametric optimization problems in a single forward pass - offer dramatic reductions in inference time compared to traditional iterative solvers. This work investigates the integration of solver sensitivities into such end-to-end proxies via a Sobolev training paradigm, in two distinct settings: (i) fully supervised proxies, where exact solver outputs and sensitivities are available, and (ii) self-supervised proxies that rely only on the objective and constraint structure of the underlying optimization problem. By augmenting the standard training loss with directional-derivative information extracted from the solver, the proxy aligns both its predicted solutions and its local derivatives with those of the optimizer. Under Lipschitz-continuity assumptions on the true solution mapping, matching first-order sensitivities is shown to yield a uniform approximation error proportional to the covering radius of the training set. Empirically, different impacts are observed in each setting. On three large Alternating Current Optimal Power Flow benchmarks, supervised Sobolev training cuts mean squared error by up to 56 percent and the median worst-case constraint violation by up to 400 percent while keeping the optimality gap below 0.22 percent. For a mean-variance portfolio task trained without labeled solutions, self-supervised Sobolev training halves the average optimality gap in the medium-risk region (standard deviation above 10 percent of budget) and matches the baseline elsewhere. Together, these results highlight Sobolev training, whether supervised or self-supervised, as a path to fast, reliable surrogates for safety-critical, large-scale optimization workloads.
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Europe > France > Bourgogne-Franche-Comté > Doubs > Besançon (0.04)
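Directional-derivative supervision avoids forming full Jacobians: the proxy only needs its Jacobian-vector product along sampled directions, compared against the solver's directional sensitivities. A hedged PyTorch sketch; the solver-provided targets `x_star` and `dx_dir` are assumed given, and all names are illustrative:

```python
import torch

def sobolev_proxy_loss(proxy, theta, x_star, dx_dir, v, lam=0.1):
    """Value + directional-sensitivity matching for an optimization proxy.

    theta:  problem parameters (batch of proxy inputs)
    x_star: solver solutions at theta
    dx_dir: solver directional sensitivities, (d x*/d theta) applied to v
    v:      direction(s) along which the sensitivities were extracted
    """
    # Forward pass and Jacobian-vector product of the proxy along v;
    # create_graph=True keeps the JVP differentiable for training
    x_pred, jvp_pred = torch.autograd.functional.jvp(
        proxy, theta, v, create_graph=True)
    value_loss = ((x_pred - x_star) ** 2).mean()
    sens_loss = ((jvp_pred - dx_dir) ** 2).mean()
    return value_loss + lam * sens_loss
```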
A Hybrid Virtual Element Method and Deep Learning Approach for Solving One-Dimensional Euler-Bernoulli Beams
Enabe, Paulo Akira F., Provasi, Rodrigo
A hybrid framework integrating the Virtual Element Method (VEM) with deep learning is presented as an initial step toward developing efficient and flexible numerical models for one-dimensional Euler-Bernoulli beams. The primary aim is to explore a data-driven surrogate model capable of predicting displacement fields across varying material and geometric parameters while maintaining computational efficiency. Building upon VEM's ability to handle higher-order polynomials and non-conforming discretizations, the method offers a robust numerical foundation for structural mechanics. A neural network architecture is introduced to separately process nodal and material-specific data, effectively capturing complex interactions with minimal reliance on large datasets. To address challenges in training, the model incorporates Sobolev training and GradNorm techniques, ensuring balanced loss contributions and enhanced generalization. While this framework is in its early stages, it demonstrates the potential for further refinement and development into a scalable alternative to traditional methods. The proposed approach lays the groundwork for advancing numerical and data-driven techniques in beam modeling, offering a foundation for future research in structural mechanics.
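The GradNorm balancing mentioned in the abstract can be sketched for the two Sobolev terms (value loss and derivative loss): each term receives a learnable weight, and the weights are updated so the per-term gradient norms at a shared layer stay balanced. A simplified PyTorch sketch of the general technique, not the paper's implementation; `shared_weight` stands for the last shared layer's weight tensor:

```python
import torch

def gradnorm_balance(losses, init_losses, shared_weight, task_weights, alpha=1.5):
    """One GradNorm step for balancing Sobolev loss terms.

    losses:       current per-term losses, e.g. [value_loss, derivative_loss]
    init_losses:  per-term losses recorded at the start of training
    task_weights: learnable positive weights, one per term
    Returns the auxiliary loss whose gradient should update task_weights only.
    """
    # Gradient norm of each weighted term at the shared layer
    gnorms = torch.stack([
        torch.autograd.grad(w * L, shared_weight, retain_graph=True,
                            create_graph=True)[0].norm()
        for w, L in zip(task_weights, losses)
    ])
    # Terms that have made less relative progress get a larger target norm
    inv_rate = torch.stack([(L / L0).detach()
                            for L, L0 in zip(losses, init_losses)])
    target = gnorms.mean().detach() * (inv_rate / inv_rate.mean()) ** alpha
    return (gnorms - target).abs().sum()
```

The main objective is then the weighted sum of the terms; the auxiliary loss above is backpropagated only into `task_weights`, which are typically renormalized after each step so they sum to the number of terms.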